Loop Splitting for Superscalar Architectures
Abstract
Program transformations and algorithm modifications are discussed that reduce execution time for iterative methods for solving partial differential equations on high-performance computers. Techniques typically associated with parallel computers turn out to be essential to obtain optimal performance on current superscalar uniprocessors. The tested programs were written in Fortran77 and run on a single processor KSR-1, SGI Indigo, Cray C90, HP-735, RS-6000, DEC 3000/600 AXP (Alpha) and Sparc workstations. A performance model is developed and used to assess the experimental data.
Similar resources
A Comparison of Superscalar and Decoupled Access/Execute Architectures
This paper presents a comparison of superscalar and decoupled access/execute architectures. Both architectures attempt to exploit instruction-level parallelism by issuing multiple instructions per cycle, employing dynamic scheduling to maximize performance. Simulation results are presented for four different configurations, demonstrating that the architectural queues of the decoupled machines p...
Java Optimization for Superscalar and Vector Architectures
This paper describes the refactoring of Java code to take advantage of the superscalar and vector architectures available on many modern desktop computers. The unrolling of Java loops is shown to cause some speed-ups for Java code. However, our benchmarks reveal that Java still lags behind vectorized C code. The present state-of-the-art in computer hardware has outpaced the current state of the...
Portable Compilation of Vector Expressions for Architectures with Memory Hierarchy
The paper presents a scheme of code generation for vector expressions implemented in the C[] compiler (C[] is a vector ANSI C superset aimed at vector and superscalar architectures). The scheme is based on two well-known optimization techniques: loop invariant code motion and iteration space tiling. The problem of finding the optimal tile size for the imperfectly nested loop system implementing ...
Efficiency of microSIMD architectures and index-mapped data for media processors
We show that microSIMD architectures are more efficient for media processing than other parallel architectures like SIMD or MIMD parallel processor architectures, and VLIW or superscalar architectures. We define alternative mappings of data onto subwords, and show that the index mapping is an ideal mapping for achieving maximal subword parallelism with minimal revamping of the original serial l...
Software pipelining for Jetpipeline architecture
High performance processors based on pipeline processing play an important role in scientific computation. We have proposed a hybrid pipeline architecture named Jetpipeline in our former work. The concept of Jetpipeline comes from the integration of superscalar, VLIW and vector architectures. Jetpipeline has multiple instruction pipelines, which execute multiple instructions like superscalar ar...